

Cooperative Heterogeneous Deep Reinforcement Learning

Neural Information Processing Systems

Numerous deep reinforcement learning agents have been proposed, and each of them has its strengths and flaws. In this work, we present a Cooperative Heterogeneous Deep Reinforcement Learning (CHDRL) framework that can learn a policy by integrating the advantages of heterogeneous agents. Specifically, we propose a cooperative learning framework that classifies heterogeneous agents into two classes: global agents and local agents. Global agents are off-policy agents that can utilize experiences from the other agents. Local agents are either on-policy agents or population-based evolutionary algorithms (EAs) agents that can explore the local area effectively. We employ global agents, which are sample-efficient, to guide the learning of local agents so that local agents can benefit from the sample-efficient agents and simultaneously maintain their advantages, e.g., stability. Global agents also benefit from effective local searches. Experimental studies on a range of continuous control tasks from the MuJoCo benchmark show that CHDRL achieves better performance compared with state-of-the-art baselines.
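The cooperation described in the abstract can be sketched as follows. This is a minimal illustrative skeleton, not the authors' implementation: all class and method names are hypothetical placeholders, and the "guidance" step is shown as a naive parameter copy, which is only one possible transfer mechanism.

```python
# Hedged sketch of the CHDRL idea: a global off-policy agent learns from a
# shared buffer that also collects the local agents' experience, and
# periodically guides the local agents. All names here are illustrative.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)

    def add(self, transition):
        self.buf.append(transition)

    def sample(self, n):
        # Sample at most n transitions uniformly at random.
        return random.sample(list(self.buf), min(n, len(self.buf)))

class Agent:
    def __init__(self, name):
        self.name = name
        self.params = 0.0          # stand-in for network weights

    def act_and_collect(self):
        # Stand-in rollout: one fake transition (s, a, r, s').
        return (0, 0, random.random(), 1)

    def update(self, batch):
        # Stand-in gradient step: nudge params toward the batch mean reward.
        rewards = [t[2] for t in batch]
        self.params += 0.1 * (sum(rewards) / len(rewards) - self.params)

global_buffer = ReplayBuffer()
global_agent = Agent("off-policy global")   # e.g., a SAC/TD3-style learner
local_agents = [Agent("on-policy local"), Agent("EA local")]

for step in range(100):
    # Local agents explore; their experience also fills the global buffer,
    # since an off-policy global agent can reuse other agents' transitions.
    for la in local_agents:
        global_buffer.add(la.act_and_collect())
    global_buffer.add(global_agent.act_and_collect())
    global_agent.update(global_buffer.sample(32))
    # Periodic guidance: local agents sync toward the sample-efficient
    # global agent (a naive parameter copy, one possible mechanism).
    if step % 20 == 0:
        for la in local_agents:
            la.params = global_agent.params
```

The key structural point is the asymmetry: experience flows from every agent into the global buffer, while guidance flows from the global agent back to the locals, so the locals keep their own exploration behavior between sync points.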



Review for NeurIPS paper: Cooperative Heterogeneous Deep Reinforcement Learning

Neural Information Processing Systems

The exact mechanics of the policy transfer between the different algorithms are not given. From the content, I assume that "transfer" means simply copying the parameters, but I remain unsure. When augmenting the experience buffer with data from another algorithm, it would be nice to clarify why this does (or does not) introduce any bias into the data. It seems that the different parts of the framework could be replaced by other ways of "tinkering" with an algorithm or its hyperparameters. For example, the auxiliary on-policy algorithms are here mainly for exploration, but the exploration of the main off-policy algorithm itself can easily be controlled, and I suspect that, with the right settings, it can work as well as the given complicated framework. The global and local experience buffers seem more like a hack.


Review for NeurIPS paper: Cooperative Heterogeneous Deep Reinforcement Learning

Neural Information Processing Systems

Following the rebuttals, all four reviewers agreed that this paper should be accepted. While there are remaining questions around the hyperparameters, the performance relative to other methods, and the computational cost, this is an interesting and novel line of work. The authors are encouraged to proofread the paper thoroughly and address the issues raised by the reviewers.



Cooperative Heterogeneous Deep Reinforcement Learning

Zheng, Han, Wei, Pengfei, Jiang, Jing, Long, Guodong, Lu, Qinghua, Zhang, Chengqi

arXiv.org Artificial Intelligence
